ci: add GitHub Actions workflow to run tests on PRs#99

Open
mrshu wants to merge 1 commit into evaleval:main from mrshu:mrshu/auto-test-trigger

Conversation

@mrshu
Contributor

@mrshu mrshu commented Apr 6, 2026

Previously no CI workflow ran pytest on pull requests, so regressions could land undetected. This adds a test.yml workflow triggered on PRs and pushes to main.

  • Split into three parallel jobs: core, inspect, helm
  • Core job runs tests that need no optional dependencies
  • Inspect and HELM jobs install their respective extras
  • Follows the same uv/setup-uv pattern as regenerate_types.yml

Closes #88

@mrshu mrshu force-pushed the mrshu/auto-test-trigger branch from 7a61a37 to 9cd77ab Compare April 6, 2026 16:22
Previously no CI workflow ran pytest on pull requests, so
regressions could land undetected. This adds a test.yml
workflow triggered on PRs and pushes to main.

- Split into three parallel jobs: core, inspect, helm
- Core job runs tests that need no optional dependencies
- Inspect and HELM jobs install their respective extras
- Pin action versions to match regenerate_types.yml
- Add timeout-minutes: 10 to guard against hung jobs
- Update uv.lock

Closes evaleval#88
@mrshu mrshu force-pushed the mrshu/auto-test-trigger branch from 9cd77ab to 7f66665 Compare April 6, 2026 16:23
@mrshu
Contributor Author

mrshu commented Apr 6, 2026

@nelaturuharsha not sure if this is what you had in mind -- happy to update it in any way you'd prefer!

@Erotemic
Contributor

Would it make sense to install all requirements and then just run all tests? I worry that lines like:

          uv run pytest \
            tests/test_inspect_adapter.py \
            tests/test_inspect_instance_level_adapter.py \
            -v

will not be updated when more files with relevant inspect tests are added. It would feel safer to me if the test command were just uv run pytest tests, so we can be more confident that everything is run. I also think we could still split out dependencies, as long as tests properly skip themselves when their dependencies are missing.
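
To make the concern concrete, here is a minimal sketch (not part of this PR) of a check that compares the test files on disk against the files named explicitly in the workflow. The file names are copied from the three jobs in the proposed test.yml; the function name `unlisted_test_files` and the `test_*.py` naming pattern are assumptions made for illustration.

```python
from pathlib import Path

# File names copied from the core, inspect, and helm jobs in the proposed test.yml.
LISTED_IN_CI = {
    "test_validate.py",
    "test_check_duplicate_entries.py",
    "test_inspect_uuid_utils.py",
    "test_cli_inspect_uuid.py",
    "test_lm_eval_adapter.py",
    "test_inspect_adapter.py",
    "test_inspect_instance_level_adapter.py",
    "test_helm_adapter.py",
    "test_helm_instance_level_adapter.py",
}


def unlisted_test_files(tests_dir: str, listed: set[str] = LISTED_IN_CI) -> list[str]:
    """Return test files under tests_dir that no CI job names explicitly.

    Hypothetical helper: assumes test modules follow the test_*.py pattern.
    """
    on_disk = {path.name for path in Path(tests_dir).glob("test_*.py")}
    return sorted(on_disk - listed)
```

Calling `unlisted_test_files("tests")` from the repository root would list any test module that the explicit file lists silently skip. Running `uv run pytest tests` avoids the need for such a check altogether.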

Contributor

@Erotemic Erotemic left a comment

I think the current PR could be accepted as is just to get something running, but I think we should avoid the globs if possible, and I think that adding locked/loose and core/full runs gives the best bang for the buck in terms of running different flavors of CI.

Comment on lines +1 to +64
name: Tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  core:
    name: Core tests
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v6.0.2
      - uses: astral-sh/setup-uv@v7.6.0

      - name: Install core dependencies
        run: uv sync --locked

      - name: Run core tests
        run: |
          uv run pytest \
            tests/test_validate.py \
            tests/test_check_duplicate_entries.py \
            tests/test_inspect_uuid_utils.py \
            tests/test_cli_inspect_uuid.py \
            tests/test_lm_eval_adapter.py \
            -v

  inspect:
    name: Inspect converter tests
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v6.0.2
      - uses: astral-sh/setup-uv@v7.6.0

      - name: Install dependencies with inspect extra
        run: uv sync --locked --extra inspect

      - name: Run inspect tests
        run: |
          uv run pytest \
            tests/test_inspect_adapter.py \
            tests/test_inspect_instance_level_adapter.py \
            -v

  helm:
    name: HELM converter tests
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v6.0.2
      - uses: astral-sh/setup-uv@v7.6.0

      - name: Install dependencies with helm extra
        run: uv sync --locked --extra helm

      - name: Run HELM tests
        run: |
          uv run pytest \
            tests/test_helm_adapter.py \
            tests/test_helm_instance_level_adapter.py \
            -v
Contributor

Suggested change
name: Tests

on:
  pull_request:
  push:
    branches: [main]

permissions:
  contents: read

jobs:
  tests:
    name: ${{ matrix.deps }} / ${{ matrix.resolution }}
    runs-on: ubuntu-latest
    timeout-minutes: 10
    strategy:
      fail-fast: false
      matrix:
        include:
          - deps: core
            resolution: locked
            sync: uv sync --locked
            pytest_args: >-
              tests
              --ignore-glob=tests/test_inspect*.py
              --ignore-glob=tests/test_helm*.py
              -v
          - deps: core
            resolution: loose
            sync: uv sync --upgrade
            pytest_args: >-
              tests
              --ignore-glob=tests/test_inspect*.py
              --ignore-glob=tests/test_helm*.py
              -v
          - deps: full
            resolution: locked
            sync: uv sync --locked --all-extras
            pytest_args: tests -v
          - deps: full
            resolution: loose
            sync: uv sync --upgrade --all-extras
            pytest_args: tests -v
    steps:
      - uses: actions/checkout@v6.0.2
      - uses: astral-sh/setup-uv@v7.6.0
      - name: Install dependencies
        run: ${{ matrix.sync }}
      - name: Run tests
        run: uv run pytest ${{ matrix.pytest_args }}

Contributor

This is the style of CI that I like to use. I also like to have a "core" or "minimal" set of tests that target the plain library without any extras. When dealing with extras, instead of worrying about different combinations, I like to use a "full" version where all the extras are installed. We could split out the inspect / helm versions, but I think it's not worth the effort.

What I do think is worth the effort is a locked / loose install check. I find it extremely valuable to test the locked version of the deps as well as the deps that would be installed if a user pip installed today. This gives a strong signal when an upstream package causes a break in this package.
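
As a rough illustration of how the two axes combine, the sketch below enumerates the four matrix entries and the `uv sync` command each one would run. The flags are copied from the suggested workflow; the dictionary and function names are hypothetical and exist only for this example.

```python
from itertools import product

# Flags copied from the suggested matrix: which extras to install...
DEPS_FLAGS = {"core": [], "full": ["--all-extras"]}
# ...and whether to use the lockfile or upgrade to the newest allowed versions.
RESOLUTION_FLAGS = {"locked": ["--locked"], "loose": ["--upgrade"]}


def sync_commands() -> dict[tuple[str, str], str]:
    """Map each (deps, resolution) pair to its `uv sync` command line."""
    commands = {}
    for deps, resolution in product(DEPS_FLAGS, RESOLUTION_FLAGS):
        args = ["uv", "sync", *RESOLUTION_FLAGS[resolution], *DEPS_FLAGS[deps]]
        commands[(deps, resolution)] = " ".join(args)
    return commands


for (deps, resolution), command in sync_commands().items():
    print(f"{deps} / {resolution}: {command}")
# core / locked: uv sync --locked
# core / loose: uv sync --upgrade
# full / locked: uv sync --locked --all-extras
# full / loose: uv sync --upgrade --all-extras
```

A failure that shows up only in a loose entry points at an upstream release, while a failure in a locked entry points at a change in this repository.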

For now I think it's fine to use --ignore-glob, since the glob method is good enough to get CI running. Longer term, we should use pytest itself to mark which tests should / shouldn't run based on the installed packages.
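
A minimal sketch of the marker-based approach, assuming a `tests/conftest.py` and two custom markers named `inspect` and `helm`. The marker names and the import names `inspect_ai` and `helm` are assumptions, not taken from this repository. One motivation: the glob `tests/test_inspect*.py` appears to also match `tests/test_inspect_uuid_utils.py`, which the PR's core job runs without any extras, so name-based exclusion can drop tests that do not need the optional dependency.

```python
import importlib.util

# Hypothetical mapping from marker name to the top-level module it needs.
MARKER_REQUIRES = {"inspect": "inspect_ai", "helm": "helm"}


def unavailable_markers(requires: dict[str, str] = MARKER_REQUIRES) -> set[str]:
    """Return the markers whose required module cannot be imported."""
    return {
        marker
        for marker, module in requires.items()
        if importlib.util.find_spec(module) is None
    }


def pytest_configure(config):
    # Register the markers so pytest does not warn about unknown marks.
    for marker, module in MARKER_REQUIRES.items():
        config.addinivalue_line("markers", f"{marker}: requires the '{module}' package")


def pytest_collection_modifyitems(config, items):
    # Imported here only so this sketch loads without pytest installed;
    # a real conftest.py would import pytest at the top of the file.
    import pytest

    missing = unavailable_markers()
    for item in items:
        for marker in missing:
            if item.get_closest_marker(marker) is not None:
                item.add_marker(pytest.mark.skip(reason=f"'{marker}' extra is not installed"))
```

With something like this in place, a test file would opt in with `pytestmark = pytest.mark.inspect`, and every matrix entry could run plain `uv run pytest tests -v` with tests for missing extras reported as skipped rather than ignored.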

Development

Successfully merging this pull request may close these issues.

[Feature] Expand test-suite and add automatic trigger for running it on PRs

2 participants